Development and Validation of Artificial-Intelligence-Based Radiomics Model Using Computed Tomography Features for Preoperative Risk Stratification of Gastrointestinal Stromal Tumors

Background: preoperative risk assessment of gastrointestinal stromal tumors (GISTs) is required for optimal and personalized treatment planning. Radiomics features are promising tools for predicting risk. The purpose of this study was to develop and validate an artificial intelligence classification algorithm, based on CT features, to define GIST prognosis as determined by the Miettinen classification. Methods: patients with a histological diagnosis of GIST and CT studies were retrospectively enrolled. Eight morphologic and 30 texture CT features were extracted from each tumor and combined to obtain three models (morphologic, texture and combined). Data were analyzed using a machine learning classification suite (WEKA). For each classification process, sensitivity, specificity, accuracy and area under the curve were evaluated. Inter- and intra-reader agreement were also calculated. Results: 52 patients were evaluated. In the validation population, the highest performance was obtained by the combined model (SE 85.7%, SP 90.9%, ACC 88.8%, and AUC 0.954), followed by the morphologic (SE 66.6%, SP 81.8%, ACC 76.4%, and AUC 0.742) and texture (SE 50%, SP 72.7%, ACC 64.7%, and AUC 0.613) models. Reproducibility was high for all manual evaluations. Conclusions: the AI-based radiomics model using CT features demonstrates good predictive performance for preoperative risk stratification of GISTs.


Introduction
Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors found in the gastrointestinal tract, accounting for about 2% of gastrointestinal tumors, with an incidence that has progressively increased over the past years [1,2].
These tumors are derived from precursors of the interstitial cells of Cajal, pacemaker cells responsible for gastrointestinal (GI) peristaltic activity. Currently, no environmental risk factor for GIST is known, but there is evidence of familial predisposition due to germline oncogene mutations: KIT or PDGFRA oncogene mutations are the most frequent [3].
Unlike other tumors, for which the TNM system represents the most commonly adopted staging tool, the risk stratification of GISTs is based on Miettinen's classification, which has been recently reviewed [4]. By integrating tumor size (≤2 cm; >2-5 cm; >5-10 cm; >10 cm), mitotic index (≤5/50 HPFs or >5/50 HPFs), and tumor site (stomach; duodenum; small bowel; rectum), this classification identifies five risk grades: none, very low, low, moderate, and high. The prognosis of GISTs is closely related to their risk grade, and different risk grades lead to different therapeutic options. Therefore, an adequate preoperative tumor assessment, including specimen collection and pathological examination based on microscopic morphology and immunophenotype, is mandatory to select an optimal therapeutic strategy for each patient [5].
Multidetector computed tomography (MDCT) plays a key role in GIST management, including detection, evaluation of tumor extent, and evaluation of treatment response. However, less is known about its role in risk assessment and prognostication of GISTs [6,7].
Given that preoperative biopsy for histopathological assessment is not routinely performed because of the risk of bleeding and/or tumor seeding, multiple MDCT imaging findings are helpful in establishing the preoperative risk stratification of GISTs [8,9].
In this setting, the emerging roles of artificial intelligence (AI) and radiomics offer new opportunities to forecast the tumor risk and aid in clinical decision making [10,11].
In particular, texture analysis (TA) has been increasingly applied to radiological imaging for diagnosing, characterizing, and monitoring treatment response by quantifying tumor heterogeneity and irregularity of tissue components [12,13]. Tumors with high heterogeneity have been shown to have worse prognosis, potentially reflecting intrinsic biological aggressiveness or treatment resistance [14][15][16][17][18][19].
Recently, some studies investigated whether MDCT TA features of GISTs could be used as imaging biomarkers, demonstrating their potential role in the characterization of tumor subtypes [20][21][22][23]. In these studies, investigators developed different methods to extract CT features using customized software or complex AI algorithms. This complexity may limit the clinical application of such promising algorithms.
Thus, the aim of our study was to develop and validate classification models based on morphologic and texture features extracted from CT images, to predict a tumor's biology using the Miettinen's classification as a reference standard.

Study Design and Population
This retrospective, non-randomized, single-center study was conducted according to the Good Clinical Practice (GCP) guidelines of the International Conference on Harmonization (ICH). We retrospectively selected patients with a pathological diagnosis of GIST who had undergone a multiphasic CT scan of the abdomen from May 2017 to September 2019. Inclusion criteria were (1) histopathological diagnosis of GIST, and (2) surgical excision of the tumor. Exclusion criteria were (1) poor CT image quality, (2) an incomplete histopathological report, and (3) neoadjuvant chemotherapy before CT.
The study was approved by the local ethical committee. Informed consent was waived because of the retrospective nature of the study and the anonymization of clinical data.

Pathological Examinations
The histopathological diagnosis of GIST was performed by an expert pathologist with more than 20 years of experience, based on microscopic morphology and immunophenotype. Immunohistochemistry (IHC) was performed on freshly cut, 3-micron-thick, paraffin-embedded tissue sections using antibodies against C-KIT/CD117 (Dako A 4502, polyclonal rabbit anti-human), according to the manufacturer's instructions. All the included cases demonstrated cytoplasmic/membranous positivity. Mitotic count was performed on 50 high power fields (HPF) and expressed as the number of mitoses/50 HPF. Tumor size was measured on formalin-fixed samples and expressed in cm. As per Miettinen's classification, tumors were stratified into five risk classes (no risk, very low risk, low risk, moderate risk, high risk) based on mitotic count, tumor size and location (Table 1) [4]. The five risk classes were dichotomized into two groups: a higher risk group (including the moderate and high risk classes) and a lower risk group (including the no risk, very low risk and low risk classes).
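The dichotomization described above can be illustrated with a minimal Python sketch. The function name `dichotomize` and the lowercase string labels are assumptions made for illustration; only the grouping of the five classes comes from the text.

```python
# Five risk classes as listed in the text, ordered from lowest to highest risk.
RISK_CLASSES = ("no risk", "very low risk", "low risk", "moderate risk", "high risk")

# Classes assigned to the higher risk group, as stated in the text.
HIGHER_RISK = {"moderate risk", "high risk"}


def dichotomize(risk_class: str) -> str:
    """Map one of the five risk classes to 'higher' or 'lower' (hypothetical helper)."""
    label = risk_class.strip().lower()
    if label not in RISK_CLASSES:
        raise ValueError(f"unknown risk class: {risk_class!r}")
    return "higher" if label in HIGHER_RISK else "lower"


if __name__ == "__main__":
    for name in RISK_CLASSES:
        print(f"{name:>14} -> {dichotomize(name)}")
```

Keeping the class list ordered makes it easy to move the cut-off between "low" and "moderate" if a different grouping were ever needed.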

MDCT Acquisition Protocol
All MDCT scans were acquired with a 16-row scanner (LightSpeed 16 slice, GE Medical Systems, Waukesha, WI, USA). All acquisitions were performed in the cranio-caudal direction from the diaphragmatic dome to the end of the ischiatic branches. Scanning parameters were as follows: kV 120; mAs 120-180; gantry rotation 0.5 s; pitch 1:1; detector configuration 16 × 1.5 mm; reconstructed section thickness 2.5 mm; standard reconstruction algorithm. Following an unenhanced scan, a portal venous phase was acquired 75 s after the injection of 0.625 mL of iodine per kg of total body weight, injected at 1.6 gI/s.

Morphologic Features
For each tumor, the following features were evaluated by two independent radiologists (with more than 10 years of experience in abdominal radiology): primary tumor location, lesion margins, angiogenesis, intralesional necrosis, peritoneal effusion, peritoneal implants, degree and pattern of contrast enhancement, and invasion of adjacent organs. Radiologists were blinded to the histopathological outcome of the tumors.
The primary tumor location was classified according to the gastrointestinal tract of origin: esophagus, stomach, duodenum, jejunum, ileum, and colon. The margins of the lesions were classified as regular when the edge of the lesion appeared smooth, or irregular when it appeared jagged. Angiogenesis was considered present when enlarged and engorged blood vessels were identified close to the lesion. Internal necrosis was considered present when intratumoral low-attenuation unenhanced areas were identified. Both peritoneal effusion and implants were scored as present or absent. The density of the primary tumor was measured by applying a circular ROI on unenhanced and portal venous phase images. The degree of contrast enhancement was scored as mild when the difference between enhanced density and unenhanced density was lower than 55.33 HU, and as high when it was greater than or equal to the same value [24]. The enhancement pattern was classified as homogeneous or heterogeneous based on the presence of different attenuation areas within the tumor. Finally, the invasion of adjacent organs was defined as an absence of clear margins between the tumor and the adjacent structures.
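As a small worked example of the enhancement scoring rule above, the following Python sketch computes the density difference between the two phases and applies the 55.33 HU cut-off. The function and variable names are hypothetical; the threshold and the "mild"/"high" labels are taken from the text.

```python
# Cut-off between mild and high enhancement, in Hounsfield units (from the text).
ENHANCEMENT_CUTOFF_HU = 55.33


def enhancement_degree(unenhanced_hu: float, portal_venous_hu: float):
    """Return (delta density in HU, 'mild' or 'high') for one tumor ROI.

    Hypothetical helper: delta is the portal venous ROI density minus the
    unenhanced ROI density; values >= the cut-off are scored as 'high'.
    """
    delta = portal_venous_hu - unenhanced_hu
    degree = "high" if delta >= ENHANCEMENT_CUTOFF_HU else "mild"
    return delta, degree


if __name__ == "__main__":
    # Illustrative ROI densities, not patient data.
    print(enhancement_degree(35.0, 80.0))  # delta of 45 HU, below the cut-off
    print(enhancement_degree(30.0, 90.0))  # delta of 60 HU, above the cut-off
```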

Texture Features
Texture features were extracted using TexRAD, a proprietary software algorithm (TexRAD Ltd., London, UK) commercially available and equipped with a simple user interface. The feature extraction process was performed by two independent radiologists (with 10 years of experience in abdominal imaging), blinded to the histopathological results.
A region of interest (ROI) was drawn around the tumor at the level of the largest tumor area as depicted on the axial MDCT portal venous phase images. The ROI was then used for texture analysis, which comprised an image histogram technique with an initial image filtration, followed by the quantification of texture within the filtered images. The in-plane filtration step was performed by means of a Laplacian of Gaussian spatial band-pass filter to produce a series of derived images highlighting features at different spatial scaling factors (SSF), ranging from fine to coarse texture within an ROI. The scale was selected by altering the filter standard deviation parameter, or σ, between 0.0 (not filtered) and 2 (coarse texture); the SSFs applied by the software were 1 mm, 1.5 mm, 1.8 mm, and 2 mm. A value of 1 mm represented a fine texture scale, values of 1.5 mm and 1.8 mm represented a medium texture scale, and 2 mm represented a coarse texture scale. Heterogeneity within each ROI was quantified with and without image filtration using the following histogram parameters: kurtosis, entropy, skewness, mean value of positive pixels (MPP), standard deviation (SD), and mean. Kurtosis, which can be positive or negative, reflects the peakedness of the histogram. Entropy is linked with the irregularity of the gray-level distribution. Skewness measures the asymmetry of the histogram and can be positive or negative. The mean is the average value of the pixels within the analyzed ROI. SD describes the variation, low or high, from the average (mean value). MPP represents the average brightness of positive pixel values within the image [25,26].
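The six histogram parameters can be illustrated with a short Python sketch operating on a flat list of ROI pixel values. This is not TexRAD's implementation, whose exact definitions are proprietary. It assumes population (not sample) moments, excess kurtosis (so that a normal distribution gives 0), and Shannon entropy in bits over the distinct gray-level values. The Laplacian of Gaussian filtration step is not reproduced here.

```python
import math
from collections import Counter


def histogram_features(pixels):
    """First-order histogram statistics of a non-empty list of ROI pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Population standard deviation (assumption: divide by n, not n - 1).
    sd = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if sd > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * sd ** 3)
        # Excess kurtosis (assumption): subtract 3 so it can be negative.
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * sd ** 4) - 3.0
    else:
        skewness = kurtosis = 0.0
    # Mean of positive pixels: average of the values strictly greater than zero.
    positives = [p for p in pixels if p > 0]
    mpp = sum(positives) / len(positives) if positives else 0.0
    # Shannon entropy of the gray-level distribution, in bits.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "sd": sd, "skewness": skewness,
            "kurtosis": kurtosis, "mpp": mpp, "entropy": entropy}


if __name__ == "__main__":
    # Illustrative values only; filtered images contain negative and positive pixels.
    for name, value in histogram_features([-2, -1, 0, 1, 2]).items():
        print(f"{name}: {value:.4f}")
```

In a filtered image the pixel values are centered near zero, which is why MPP (restricted to positive values) carries information that the plain mean does not.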

Machine Learning Classification
Both the morphologic and the texture features extracted from CT images were combined and analyzed using the WEKA (Waikato Environment for Knowledge Analysis, Version 3.8.5, University of Waikato, Hamilton, New Zealand) machine learning (ML) suite for data mining classification. A total of thirty-eight features were extracted: eight morphologic features and thirty texture features. The aim of this process was to identify an ML classification algorithm able to distinguish higher- and lower-risk patients as determined by Miettinen's classification, which was considered the reference standard.
During the first step, patients were subdivided into two groups. Using the WEKA Explorer Filter tool, two thirds of the patients were placed in the training group and one third in the validation group, after the population had initially been randomly reordered. The training group was analyzed using Auto-WEKA, a dedicated package that allows the automatic identification of the best model with its best parameter settings (hyperparameter optimization) for a given classification or regression task, as well as a feature selection process. This analysis was performed separately for the eight morphologic features and the 30 texture features. Finally, the process was performed on all 38 features merged.
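The shuffle-then-split step can be sketched in plain Python as follows. This is an illustration of the general idea only, not a reproduction of the WEKA filters: the seed, the rounding rule, and the function name are assumptions, so the resulting group sizes need not match those obtained in the study.

```python
import random


def split_train_validation(patient_ids, train_fraction=2 / 3, seed=42):
    """Randomly reorder the patients, then cut the list into two groups.

    Hypothetical helper: the seed and the rounding of the training size
    are illustrative choices.
    """
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # reproducible random reordering
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    train, validation = split_train_validation(range(1, 53))
    print(len(train), "training patients;", len(validation), "validation patients")
```

Fixing the seed keeps the split reproducible, which matters when the same partition must be reused for the morphologic, texture, and combined analyses.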
The optimized classification algorithms, identified by the Auto-WEKA analysis, were applied to the validation group, performing three separate analyses: morphologic features, texture features, and combined (morphologic and texture) features. For each classification model, sensitivity (SE), specificity (SP), accuracy (ACC) and area under the curve (AUC) were evaluated.
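For reference, the four performance measures can be computed as in the sketch below. It assumes binary labels with 1 for the higher risk group and 0 for the lower risk group, and it computes the AUC with the rank-based (Mann-Whitney) formulation, which is one standard way of obtaining it; WEKA's internal computation is not reproduced. All names and the example data are illustrative.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = higher risk).

    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {"SE": tp / (tp + fn), "SP": tn / (tn + fp), "ACC": (tp + tn) / len(pairs)}


def auc(y_true, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative labels and classifier outputs, not study data.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6]
    print(confusion_metrics(y_true, y_pred))
    print("AUC:", round(auc(y_true, scores), 3))
```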

Statistical Analysis
All continuous variables were expressed as mean and standard deviation (SD). Differences in patients' sex distribution, tumor location, tumor size, mitotic rate, Miettinen's risk score and morphologic features were calculated using a χ 2 test with Yates's correction. The Student t test was calculated to find significant differences in patients' age.
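For a 2 × 2 contingency table with cells a, b (first row) and c, d (second row) and total N, the Yates-corrected statistic is N(|ad − bc| − N/2)² divided by the product of the four marginal totals. The sketch below implements this formula for the 2 × 2 case only; variables with more than two categories would need the general form. The p-value uses the closed form for one degree of freedom. Function name and example counts are hypothetical.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value for a 2 x 2 table.

    Table layout: [[a, b], [c, d]]. Only valid for one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must contain at least one case")
    # Continuity correction, clipped at zero so the statistic is never inflated.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts, not study data.
    chi2, p = chi2_yates_2x2(10, 20, 20, 10)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```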
The one-way ANOVA with Fisher's LSD test was used to find significant differences in texture features.
Since the features implemented in the classification algorithms were derived from manual assessments, intra-reader and inter-reader agreement were calculated. The reproducibility of the morphologic feature evaluation was calculated with the weighted Cohen's kappa (κ) analysis, while the reproducibility of the texture feature measurement was calculated using the intraclass correlation coefficient (ICC). One of the two radiologists performed all measurements twice for intra-reader agreement. Agreement was interpreted according to the following criteria: A two-tailed p < 0.05 was considered statistically significant.
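A weighted Cohen's kappa can be sketched as follows. The weighting scheme is not specified in the text, so linear disagreement weights are assumed here; with only two categories the result coincides with the unweighted kappa. The function name, category labels, and ratings are illustrative.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted Cohen's kappa for two readers (assumed weighting).

    `categories` lists the ordered categories (at least two);
    ratings are values drawn from that list.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    # Observed proportion of cases in each (reader A, reader B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weight: 0 on the diagonal, 1 for the farthest pair.
    weight = lambda i, j: abs(i - j) / (k - 1)
    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return 1 - d_obs / d_exp


if __name__ == "__main__":
    # Illustrative ratings of enhancement degree by two readers.
    reader_1 = ["mild", "high", "mild", "high", "high"]
    reader_2 = ["mild", "high", "high", "high", "high"]
    print(round(weighted_kappa(reader_1, reader_2, ["mild", "high"]), 3))
```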

Study Population
Eighty-one patients were retrospectively selected from our database. Twenty-nine patients were excluded from the analysis because of an incomplete histology report (17), low-quality CT images (4), or neoadjuvant chemotherapy before CT (8). Thus, the final study population consisted of fifty-two patients (Figure 1).
As per the reviewed Miettinen's classification, lesions were stratified as follows: 5.8% (3) no risk, 27% (14) very low risk, 25% (13) low risk, 21.1% (11) moderate risk, and 21.1% (11) high risk. Accordingly, 22 patients (42.3%) were included in the higher risk group, and 30 patients (57.7%) in the lower risk group. No statistically significant differences were observed between the higher and lower risk groups in terms of gender, age, tumor location and mitotic rate, while a significant difference was observed in tumor size. Results are summarized in Table 2.
The WEKA Explorer Filter tool randomly subdivided the population into two groups: the first group, including 31 patients (59.6%), was used for model training, and the second group, made up of 21 patients (40.4%), for model testing. No statistically significant differences were observed between the training and the validation groups for any of the characteristics evaluated. Results are summarized in Table 2.

Morphologic and Texture Features
The higher and lower risk groups differed significantly for most of the morphologic features (margins, angiogenesis, necrosis, peritoneal effusion, peritoneal seeding, organ invasion, and enhancement pattern) and some of the texture features (SF0mean, SF0MPP, SF1.5SD, SF1.5MPP, SF1.8mean, SF1.8SD, SF1.8MPP, SF2mean, SF2SD, and SF2MPP). No differences were observed between the training and validation groups for either morphologic or texture features. Results are summarized in Tables 3 and 4.

Machine Learning Models Training
Among the 38 features, only 16 were selected for the development of the classification model according to the Auto-WEKA analysis (10-fold cross validation attribute evaluator "CorrelationAttributeEval"). Four of the eight morphologic features were selected: angiogenesis, necrosis, delta density, and enhancement pattern. In addition, 12 of the 30 texture features were selected for the model development: SF0_sd, SF0_entropy, SF0_skewness, SF1_mean, SF1_sd, SF1_entropy, SF15_mean, SF15_sd, SF15_entropy, SF18_mean, SF2_mean and SF2_sd.
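The idea behind a correlation-based attribute evaluator can be illustrated with the following sketch, which ranks each feature by the absolute Pearson correlation between its values and a binary class label (1 = higher risk, 0 = lower risk). This is a simplified stand-in written for illustration, not WEKA's code; the cross-validated evaluation and the selection cut-off used in the study are not reproduced, and the feature names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(features, labels):
    """Sort features by absolute correlation with the class, highest first.

    `features` maps a feature name to its list of values, one per patient.
    """
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up values for two hypothetical features in four patients.
    features = {"feature_a": [1, 2, 3, 4], "feature_b": [1, -1, -1, 1]}
    labels = [0, 0, 1, 1]
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f}")
```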
Three models were subsequently developed: the first one based on morphologic features only (morphologic model), the second one on texture features only (texture model), and the last one using both feature classes (combined model).
The Multilayer Perceptron (MLP) classifier was identified as the best for the combined model. The hyperparameters were optimized as follows: The model performances were SE 100%, SP 95%, ACC 97.1%, and AUC 0.968.
Results are summarized in Table 5.

Discussion
The aim of this study was to develop and validate a decision support model, based on the combination of CT morphologic and texture features, to classify patients affected by GIST as higher or lower risk according to Miettinen's classification. Preoperative risk assessment is required for optimal and personalized treatment planning [8].
According to our results, a combined model, based on morphologic and texture features, performed better than models based on either feature class alone.
A few recently published studies have already investigated the potential role of radiomics in the risk assessment of GISTs. Chen T. et al. [27] evaluated a radiomic nomogram using morphologic features to predict the malignant risk of GISTs, obtaining an AUC of 0.847 (95% CI: 0.818-0.915) and demonstrating that radiomics features combined with clinical data and typical CT characteristics were more effective in evaluating the malignant potential of GISTs than models based on clinical data or typical CT characteristics. Zhang L. et al. [28] also demonstrated favorable performance of a 5-CT-feature-based radiomic model in discriminating risk stratification according to Miettinen's classification, with an AUC of 0.809 (95% CI: 0.777-0.841). Wang et al. [9] also developed four different radiomic models based on morphological features extracted from arterial and venous enhanced CT scans to predict the malignancy risk of GISTs, which resulted in higher diagnostic performance compared to clinical data and/or typical CT characteristics.
The performance of the combined model obtained in the present study is in line with that of previous studies. The main difference between our study and the previous ones lies in the methodological approach. In particular, in-house-developed software and complex analyses were used in the previous studies. Such approaches may yield higher performance but can adversely affect reproducibility. The latter is a well-known concern for radiomics and AI studies, as confirmed by recent initiatives focused on the assessment of quality and reproducibility in this field, such as the Radiomics Quality Score [29]. In this context, one of the major strengths of our approach is the use of commercially available software. Both TexRAD, for texture feature extraction, and WEKA, for machine learning algorithm development, have been widely utilized and validated, especially for oncologic imaging [14,30,31].
Unlike the previous studies, a reduced number of radiomic features, namely first-level texture features, were included in the model-building process. Although this might be considered a limitation, it should be noted that in most radiomic studies a feature selection process is mandatory in order to avoid overfitting.
Another major difference is the use of a machine learning suite (WEKA) with a simple user interface. This software is optimized for supervised machine learning analysis, with a specific tool (Auto-WEKA) for feature and classifier optimization. As described in our work, different classifiers and different hyperparameters were used for the three models. MLP emerged as the best classifier for the morphologic and combined models, albeit with different hyperparameters, whereas locally weighted learning (LWL) was the most accurate for the texture model. The model training was performed using a 10-fold cross validation method, which allowed us to partially overcome the main limitation of the present study, namely the small sample size.
Another limitation of the study is the retrospective nature of patient enrollment. However, in line with most of the previous similar studies, this design could be considered appropriate for preliminary results. Finally, a further limitation is the manual tumor contouring on a single slice (2D). In this setting, automatic tumor segmentation software or 3D feature extraction are known to be the best methods. However, the most advanced segmentation tools require specific software, thereby increasing the complexity of the analyses and hampering their implementation in the clinical workflow. On the other hand, the 2D approach adopted in our study demonstrated high reproducibility and may be well suited for routine use.

Conclusions
Noninvasive risk stratification of GISTs may be performed by means of a combined model, based on morphologic and texture features obtained from CT images. The proposed approach, based on commercially available software, might be considered relatively easy to perform and suitable for clinical practice if results can be confirmed in a larger population.